Workshop goals

What is R?

R is a software environment for statistical computing and graphics. Using R you can do rigorous statistical analysis, clean and manipulate data, and create publication-quality graphics.

clustering map

Popularity of R

Source: Stephen Cass, “The 2016 Top Programming Languages”, IEEE Spectrum

R packages

Packages are programs that you import into R to help make tasks easier. The most popular R packages for working with data include dplyr, stringr, tidyr, and ggplot2.
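As a minimal sketch of what "importing" a package looks like in practice (assuming an internet connection for the one-time install, and using ggplot2 as the example package):

```r
# Install once per computer (needs an internet connection); uncomment to run.
# install.packages("ggplot2")

# Load the package in every new R session before using its functions.
library(ggplot2)
```

The same two steps apply to dplyr, stringr, and tidyr: install once, then call library() at the top of each script.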

How to find a package

There’s no easy way (yet) for new R users to find the R packages they might need, although people are working on this problem. In the meantime, consult the following list or ask a librarian!

Resources include:

Core R functions for plotting

You can create graphs in R without installing a package, but packages will allow you to create better visualizations that are any of the following:

ggplot2

ggplot2 is the most popular visualization package for R. It’s the best all-purpose package for creating many types of 2-dimensional visualizations.


highcharter

leaflet

plotly

Deciding on the right package

ggplot2: the most important package to learn first

ggplot2 is built on the principles of “A Layered Grammar of Graphics” (2010) by Hadley Wickham, which in turn builds on work by Wilkinson, Anand, & Grossman (2005) and Jacques Bertin (1983).

Essentially: graphs are like sentences you can construct, and they have a grammar. The grammar of graphics consists of the following:

at least one layer:

scale
coordinate system
facet (optional)

These components make up a graph.
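To make the grammar concrete, here is a rough sketch that spells out each component in ggplot2 code. It assumes ggplot2 is installed and uses the mpg dataset that ships with it; the axis label and the choice of drv as the faceting variable are illustrative choices, not part of the workshop script.

```r
library(ggplot2)

grammar_demo <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +                                        # layer: one point per car
  scale_x_continuous(name = "Engine displacement (litres)") +  # scale for the x-axis
  coord_cartesian() +                                   # coordinate system
  facet_wrap(~ drv)                                     # facet (optional): one panel per drive type

grammar_demo  # display the graph
```

In everyday use you rarely write the scale and coordinate system explicitly, because ggplot2 supplies defaults for both when they are left out.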

Open script.R file

Open RStudio.
Download the following file: script.R
File > Open File…
Select the script.R file that you just downloaded.
Click Open.

Get to know the data

Let’s see an example of a simple graph created with ggplot2. We are going to use the mpg dataset, which describes different cars and their properties.

Exercise #1: In your script file, run ?mpg to learn more about this dataset. To run the code, highlight it and then click Run (shortcut keys: Command + Enter on Mac, Ctrl + Enter on Windows).

Exercise #2: Run head(mpg) to see the first few rows of the data.

## # A tibble: 6 x 11
##   manufacturer model displ  year   cyl      trans   drv   cty   hwy    fl
##          <chr> <chr> <dbl> <int> <int>      <chr> <chr> <int> <int> <chr>
## 1         audi    a4   1.8  1999     4   auto(l5)     f    18    29     p
## 2         audi    a4   1.8  1999     4 manual(m5)     f    21    29     p
## 3         audi    a4   2.0  2008     4 manual(m6)     f    20    31     p
## 4         audi    a4   2.0  2008     4   auto(av)     f    21    30     p
## 5         audi    a4   2.8  1999     6   auto(l5)     f    16    26     p
## 6         audi    a4   2.8  1999     6 manual(m5)     f    18    26     p
## # ... with 1 more variables: class <chr>
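
Besides head(), a few base R functions give a quick overview of a dataset. This is an optional sketch, assuming ggplot2 is loaded so that mpg is available:

```r
library(ggplot2)

dim(mpg)    # number of rows and columns
names(mpg)  # column names, including the class column hidden in the output above
str(mpg)    # the type of each column and its first few values
```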

Exercise #3: ggplot syntax

The graph below uses ggplot2 to look for a correlation between a car’s engine displacement and highway mileage.

Run the following code in your script file:
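
A minimal sketch of such a scatterplot, assuming ggplot2 is installed and using the mpg columns displ (engine displacement) and hwy (highway mileage):

```r
library(ggplot2)

# data = which dataset to plot; aes() = which columns go on which axis;
# geom_point() = draw one point per row
displ_hwy_plot <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

displ_hwy_plot  # display the graph
```

The general pattern is ggplot(data = ...) plus a geom function whose aes() call maps columns to visual properties.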

Exercise #4: Practice

Make a scatterplot with cyl mapped to the x-axis and hwy mapped to the y-axis.

Solution to #4
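
One possible solution, sketched under the assumption that ggplot2 is loaded; only the column mapped to x changes from the previous exercise:

```r
library(ggplot2)

# cyl (number of cylinders) on the x-axis, hwy (highway mileage) on the y-axis
cyl_hwy_plot <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = cyl, y = hwy))

cyl_hwy_plot  # display the graph
```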

Exercise #5: Mapping a variable to color

Make a scatterplot with displ mapped to the x-axis and hwy mapped to the y-axis, and with class mapped to the color aesthetic. Run:
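
A minimal sketch of this plot, assuming ggplot2 is loaded. Note that the engine displacement column in mpg is named displ:

```r
library(ggplot2)

# Adding color = class inside aes() gives each type of car its own color
class_color_plot <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

class_color_plot  # display the graph; ggplot2 adds the legend automatically
```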

Exercise #6: Make the same scatterplot as the previous example, but map drv to color.

Solution to #6
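
One possible solution, sketched under the assumption that ggplot2 is loaded; only the variable mapped to color changes:

```r
library(ggplot2)

# Same scatterplot as before, but with drv (drive type) mapped to color
drv_color_plot <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = drv))

drv_color_plot  # display the graph
```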

The type of drive system the car has (4-wheel, rear-wheel, and front-wheel) is mapped to color.

Exercise #7: Aesthetic parameters

Variables can be mapped to the following aesthetic parameters. If you are publishing in black and white and can’t use color, you might want to use size or shape instead:

Substitute another aesthetic in place of color. Run the code:
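
As one example of the substitution, this sketch maps drv to shape instead of color. It assumes ggplot2 is loaded; choosing drv and shape here is an illustrative choice, and other combinations work the same way.

```r
library(ggplot2)

# shape works well for a categorical variable with only a few values, such as drv
drv_shape_plot <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = drv))

drv_shape_plot  # display the graph
```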

Exercise #8: Faceting

Facets are a way to create multiple smaller charts, or subplots, based on a variable. Run this code to see what faceting does:
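
A minimal sketch of a faceted plot, assuming ggplot2 is loaded and using facet_wrap() with class as the faceting variable:

```r
library(ggplot2)

# facet_wrap(~ class) draws one small scatterplot per type of car
class_facet_plot <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class)

class_facet_plot  # display the graph
```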

Exercise #9: Practice Faceting.

Replace class with another variable in the dataset, for example trans, drv, or cyl.
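
One possible answer, sketched with cyl as the faceting variable (assuming ggplot2 is loaded):

```r
library(ggplot2)

# One panel for each number of cylinders found in the data
cyl_facet_plot <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ cyl)

cyl_facet_plot  # display the graph
```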